BMDx is an R Shiny graphical interface tool for the dose dependent analysis of transcriptomic data.
BMDx implements a multi-step pipeline that include the following steps: - Set mode - Load phenotype data - Load experimental data, e.g. normalized expression matrix and vst normalized expression matrix with blind=False for microarray and RNA-seq downstream analysis - Filtering - Benchmark dose analysis - Functional enrichment with the FunMappOne tool, e.g. KEGG pathways - Gene pairs analysis - AOP and KE enrichment - Download report
BMDx not only allows the user to compare the results between multiple time points, but also multiple experiments at once. Results along the way are visualised as several types of plots and the output can be downloaded as Excel files or figures in pdf format at multiple steps of the analysis.
The final results include the estimation of effective doses (BMD/BMDL/BMDU) for each dose-dependent gene. These estimates can be used for further analysis to help construct a biological interpretation of the exposure of interest.
BMDx is implemented to work in a dual mode: GLP (Good Laboratory Practice) and non-GLP mode. If GLP mode is selected by the user, all the steps of the analysis will be recorded in a log file. Furthermore, at every step of the analysis the user will be prompted to add information about the analysis. First, the user will be asked to give a description about the aims of the analysis. Next, for every step that the user perform, an explanation for performing the step and the selected values for the parameters will be requested.
BMDx takes as an input a phenotype file and an expression matrix, both provided as an Excel spreadsheet (xlsx). If multiple experiments are included, both files must contain separate sheets for each experiment in corresponding orders. Specific instructions for the file structures are provided below. Sample data can be found here.
The phenotype file is an Excel file containing separate sheets for each experiment. Each sheet contains information about the samples used in the specific experiment. In particular, BMDx requires the spreadsheets to have at least three columns that specify the following characteristics: 1) Unique sample IDs corresponding to the column names in the expression matrix, 2) condition (e.g., dose) and 3) the time points included in the experiment. Each sheet must have the columns (sample ID, condition, and time point) in the same positions.
Sample pheno data
The required parameters are the following:
Once all options are set, load the phenotype data and proceed to the next step.
An overview of the metadata is presented
in the “Metadata data” window.
Overview metadata
The expression matrix is an Excel file with a separate sheet for each experiment. Each experiment involves a unique test system or materials being tested, such as different cell lines or treatments. Note that different doses or timepoints tested within the same system are not considered separate experiments. Sample columns are named with unique sample IDs that match the sample IDs provided in the phenotype file. Gene names must be provided in the first column of each spreadsheet, and each following column specifies the expression values for those genes in each individual sample. The order of the sheets must match the order of the sheets in the phenotype file.
The required parameters are the following:
Load expression data parameters
Once all options are set, import the normalized expression matrix and proceed to the next step. An overview of the gene expression matrix is presented in the “Experimental data” window.
Overview expression data
The number of genes (or molecules) profiled by transcriptomic data is usually high. Thus, in order to reduce the time of the BMDx analysis and focus on relevant genes, different filtering criteria are available:
The user can also decide to skip the filtering.
Anova and trend test parameters
The results of the Anova and Trend test analysis will be displayed in a table with the following columns: - Exp: the name of the experiment - Time: the time point of the experiment - Feature: the name of the gene - Pvalue: the p-value obtained from the analysis - adjPVal: the adjusted p-values - usedFilteringPVal: the p-value used to filter the results. If the adjustment is performed, usedFilteringPVal will be equal to adjPVal, otherwise will be equal to Pvalue.
## Fold-change filtering
results
The results of the filtering by fold-change will be displayed in a table with the following columns:
The benchmark dose (BMD) is the dose or concentration of a substance that corresponds to a specified level of response above or below that observed in a control or background population. The specified level of response within this definition is referred to as the benchmark response (BMR), while the statistical lower confidence bound of the BMD (referred to as BMDL) and the statistical upper confidence bound on the BMD (BMDU) have been typically used by regulatory agencies to set safe levels of exposure. BMD modelling involves fitting the experimental data, in this case, the gene expression values, to a selection of mathematical models, such as linear, second- or third- degree polynomial, an exponential model, hill model, asymptotic regression model, and Michaelis-Menten model. Additionally, when multiple models can be fitted to the same gene, their average consensus model can be also produced. The best model is selected by using a goodness of fit criteria, such as the Akaike information and the goodness-of-fit p-value. Alternatively, the average consensus model can be also used. A predefined response level of interest, the BMR, is identified and the optimal model is used to predict the corresponding dose (BMD) (Abraham et al. 2012). Moreover, the European Food Safety Authority (EFSA) suggest reporting both the lower and upper 95% confidence limit on the BMD called BMDL and BMDU respectively (EFSA Scientific Committee et al. 2017).
\[ f(dose) = \beta_0 + \beta_1 dose \] Where \(\beta_0\) is the control response (intercept), and \(\beta_1\) is the slope.
\[ f(dose) = \beta_0 + \beta_1 dose + \beta_2 dose^2 + \ldots + \beta_n dose^n \] Where \(\beta_0\) is the control response (intercept), \(\beta_1, \dots, \beta_n\) are the polynomial coefficients, and \(n\) is the degree of the polynomial.
\[ f(dose) = \beta_0 + \beta_1 \times (dose)^\delta \] Where \(\beta_0\) is the control response (intercept), \(\beta_1\) is the slope, and \(\delta\) is the power.
\[ f(dose) = \beta_0 + \frac{v \times dose^n}{K^n + dose^n} \] Where \(\beta_0\) is the control response, \(v\) is the maximum response, \(K\) is the dose at which half the maximum response is reached, and \(n\) is the Hill coefficient.
\[ f(dose) = a \times e^{\pm b \times dose} \]
\[ f(dose) = a \times e^{\pm (b \times dose)^d} \]
\[ f(dose) = a \times (c - (c-1) \times e^{\pm b \times dose}) \]
\[ f(dose) = a \times (c - (c-1) \times e^{\pm (b \times dose)^d}) \] Where \(a\) is the control response (intercept), \(b\) is the slope, \(c\) is the asymptote term, and \(d\) is the power.
\[ f(dose, (b, c, d, e, f)) = c + \frac{d-c}{(1+\exp(b(\log(dose)-\log(e))))^f} \] If \(f \neq 1\), the function is asymmetric; otherwise, it is symmetric (on a log scale).
\[ f(dose, (b, c, d, e)) = c + \frac{d-c}{1+\exp(b(\log(dose)-\log(e)))} \] The function is symmetric about the inflection point \(e\).
\[ f(dose, (b, c, e)) = c + \frac{1-c}{1+\exp(b(\log(dose)-\log(e)))} \] The function is symmetric about the inflection point \(e\).
\[ f(dose, (b, e)) = \frac{1}{1+\exp(b(\log(dose)-\log(e)))} \] The function is symmetric about the inflection point \(e\).
\[ f(dose, (b, e)) = 1 - \exp(-b \cdot dose^e) \] This model describes an asymmetric sigmoidal growth curve.
\[ f(dose, (b, d, e)) = d - d \cdot \exp(-b \cdot dose^e) \] Extends Weibull 1.2 by adding a scaling parameter \(d\) that adjusts the upper asymptote.
\[ f(dose, (b, c, d, e)) = c + (d - c) \cdot (1 - \exp(-b \cdot dose^e)) \] Allows both upper (\(d\)) and lower (\(c\)) asymptotes, providing greater flexibility.
\[ f(dose, (b, e)) = \exp(-\exp(b \cdot (\log(dose) - e))) \] A sigmoidal decay function where \(e\) is the inflection point.
\[ f(dose, (b, d, e)) = d \cdot \exp(-\exp(b \cdot (\log(dose) - e))) \] Extends Weibull 2.2 with a scaling parameter \(d\), modifying the maximum response.
\[ f(dose, (b, c, d, e)) = c + (d - c) \cdot \exp(-\exp(b \cdot (\log(dose) - e))) \] Generalizes Weibull 2.3 by incorporating both lower (\(c\)) and upper (\(d\)) asymptotes.
\[ f(dose, (c, d, e)) = c + \frac{d-c}{1+(e/dose)} \] This model increases as a function of dose, attaining the lower limit \(c\) at dose \(0\) and the upper limit \(d\) for infinitely large doses. The parameter \(e\) corresponds to the dose yielding a response halfway between \(c\) and \(d\).
A two-parameter version of the Michaelis-Menten model is obtained by setting \(c=0\).
BMDx implements model averaging through an AIC-based combination of models fitted to the same gene’s dose-response data. By calculating Akaike weights from the AIC values of each model, it assigns relative probabilities to the models, reflecting their support from the data. Predictions for benchmark dose (BMD), lower bounds (BMDL), and upper bounds (BMDU) are generated for each model and combined using these weights to produce averaged estimates. This approach ensures that the final predictions account for model uncertainty, integrating information from multiple models while penalizing overly complex or poorly fitting ones.
For every gene with multiple available models, the average model is obtained by weighting the predictions of the models by their AIC values. This ensures that models with a better fit contribute more to the final prediction while models with poorer fits have less influence.
The following steps are followed:
Fitting individual models: Each model is fitted separately to the dose-response data.
Computing AIC values: The Akaike Information Criterion (AIC) values for each model are extracted.
Calculating Akaike Weights:
\[ w_i = \frac{\exp(-\frac{AIC_i - AIC_{min}}{2})}{\sum \exp(-\frac{AIC_i - AIC_{min}}{2})} \]
where \(w_i\) represents the weight assigned to each model.
Computing Weighted Predictions:
\[ \hat{Y} = \sum w_i \cdot Y_i \]
where \(Y_i\) is the prediction from each model.
Model comparison is performed by means of the Akaike Information Criterion (AIC) implemented in the R stats package as follows:
\[ -2 x log-likelihood + k x npar\]
where \(npar\) represents the number of parameters in the fitted model, and \(k=2\) for the usual AIC.
BMDx provides multiple options for handling variance assumptions when fitting dose-response models. Users can specify whether variance should be treated as constant, nonconstant, or model-dependent, or they can allow the tool to infer the appropriate variance assumption based on statistical testing.
BMDx supports four methods for variance estimation:
“infer” (default): The tool automatically determines whether variance is constant or nonconstant using the Levene test for homogeneity of variance. If the test returns a p-value below the chosen significance threshold (e.g., 0.05), variance is classified as nonconstant. Otherwise, variance is assumed to be constant.
“constant”: The model assumes homoscedasticity
(equal variance across all observations). Variance is estimated using
the Mean Squared Error (MSE) of the residuals,
calculated as:
\[
\sigma^2 = \frac{1}{n} \sum (r_i^2)
\] where \(r_i\) represents the
model residuals.
“nonconstant”: Variance is assumed to be heteroscedastic (changing across dose levels). It is estimated using only the residuals from control samples (dose = 0), following the Root Mean Squared Error (RMSE) approach for the control group.
If control_variance is set to "infer", BMDx
applies the Levene test to assess variance homogeneity across groups.
The test evaluates whether population variances are equal (null
hypothesis). If the test result (p-value) is below the selected
significance level, variance is classified as
nonconstant; otherwise, it is considered
constant.
For constant variance, the model estimates variance using the residuals from all samples. For nonconstant variance, only residuals from control samples (dose = 0) are used for variance estimation. If model-based variance is selected, a variance function is fitted to the residuals, and variance predictions are incorporated into a weighted regression model to refine the estimates dynamically.
By default, if no variance option is specified, BMDx will automatically infer the variance type based on statistical testing.
The adverse direction of the models is determined based on the sign of specific model parameters. This estimation follows different approaches depending on the model type:
Linear Model: The sign of the slope coefficient is used to determine the direction of the adverse response. A positive slope indicates a positive (+1) adverse reaction, while a negative slope indicates a negative (-1) adverse reaction.
Power Model: The sign of the b coefficient is used to assess the adverse direction. A positive b coefficient corresponds to a positive (+1) adverse reaction, whereas a negative b coefficient corresponds to a negative (-1) adverse reaction.
Hill Model: The sign of the v coefficient is used to estimate the adverse reaction direction. A positive v coefficient indicates a positive (+1) adverse reaction, while a negative v coefficient indicates a negative (-1) adverse reaction.
Polynomial, Exponential, and Models from the
drc Package: For these models, a linear regression
is performed, and the sign of the slope coefficient is used to estimate
the adverse reaction direction. A positive slope results in a
positive (+1) adverse reaction, whereas a negative
slope results in a negative (-1) adverse
reaction.
The coefficient of determination (R²) is used to evaluate the goodness of fit of the models. It quantifies the proportion of variance in the observed data that is explained by the fitted model. Different model types use distinct approaches to compute R², as described below:
Models from the drc package:
The R² value is computed using the variance-based
formula:
\[
R^2 = 1 - \frac{\text{Var}(\text{residuals})}{\text{Var}(\text{observed
values})}
\] where the residual variance is divided by the variance of the
observed response values, providing a measure of how well the model
accounts for the data variability.
Exponential Model:
The R² value is computed using the
modelr::rsquare() function, which applies a standard
squared correlation approach between observed and predicted values,
ensuring consistency in nonlinear regression evaluation.
Hill Model:
The R² value is computed similarly to the exponential
model, using modelr::rsquare(). This function assesses the
proportion of explained variance by comparing fitted values with
observed responses.
Linear Model:
For linear regression models, R² is derived from the
standard formula used in lm() results:
\[
R^2 = 1 - \frac{\sum (y_{\text{obs}} - y_{\text{pred}})^2}{\sum
(y_{\text{obs}} - \bar{y})^2}
\] where \(y_{\text{obs}}\)
represents the observed response, \(y_{\text{pred}}\) the predicted values, and
\(\bar{y}\) the mean of the observed
responses. The R² value is extracted directly from
summary(lm())$r.squared.
Polynomial Models:
Similar to the linear model, polynomial models use the standard
R² computation from
summary(lm())$r.squared. Since polynomials are fitted using
linear regression techniques, their goodness-of-fit evaluation follows
the same approach.
Power Model:
The R² value for power models is computed using
modelr::rsquare(), as in the exponential and Hill models.
This function provides a robust measure of the proportion of variance
explained by the fitted model.
An R² value close to 1 indicates that the model explains a high proportion of the variability in the observed data, while an R² value close to 0 suggests that the model does not provide a good fit. It is important to note that R² does not inherently account for model complexity, so it should be interpreted alongside other model selection criteria, such as AIC.
The BMR can be computed by means of three different methods: standard deviation, relative and absolute. The standard deviation method for normal distributed responses it is defined as
\[ \dfrac{| m(BMD) -m(0) |}{s(0)} = BMRF \]
Where \(s(0)\) is the standard deviation of the controls, and \(m(0)\) is the predicted value of the response at the control level. The BMRF is the multiple of the standard deviation, whose default value1.349 corresponds to a 10% of difference with respect to the controls. Thus, the BMD (benchmark doses) is the estimated dose, whose predicted value \(m(BMD)\) corresponds to the desired BMRF.
The Benchmark Dose Lower Bound (BMDL) and Benchmark Dose Upper Bound (BMDU) values are estimated using confidence interval methods as suggested by Gaylor et al. (1998). This approach provides an uncertainty range for the Benchmark Dose (BMD), ensuring that the estimated dose-response relationship accounts for statistical variability.
The estimation of BMDL and BMDU is based on the confidence interval of the predicted dose-response curve. Given a predefined confidence level (e.g., 95%), the lower (BMDL) and upper (BMDU) bounds are determined by interpolating the benchmark response (BMR) within the confidence interval of the fitted model.
BMDL and BMDU are computed using the
predict() function on a range of dose
values spanning the observed data. The interpolation is performed using
the stats::approx() function to estimate
the benchmark dose bounds at the benchmark response (BMR) level. The
estimated values are then stored as BMDL and
BMDU in the model output.
This methodology is aligned with the U.S. Environmental Protection Agency (EPA) recommendations for benchmark dose estimation, following the framework outlined by Gaylor et al. (1998) (link).
The BMD analysis requires the following parameters:
Number of cores: The number of cores that are used during the analysis
Maximum Iterations: Is the maximum number of iterations allowed as a convergence criteria
Data type: Is a string character describing the type of response. Possible values are continuous (default) and binomial
Deviation type: Is a string character describing the type of deviation computed. Possible values are standard (default) and relative and absolute.
BMR factor: Is the number of standard deviation at which the BMD is defined. The default value is 1.349, which corresponds to a change of 10% with respect to the controls. For further information refer to the following papers: Thomas, Russell S, Bruce C Allen, Andy Nong, Longlong Yang, Edilberto Bermudez, Harvey J Clewell, and Melvin E Andersen. 2007. “A Method to Integrate Benchmark Dose Estimates with Genomic Data to Assess the Functional Effects of Chemical Exposure.” Toxicological Sciences 98 (1): 240–48. https://doi.org/10.1093/toxsci/kfm092; Filipsson, Agneta Falk, Salomon Sand, John Nilsson, and Katarina Victorin. 2003. “The Benchmark Dose Method–Review of Available Models, and Recommendations for Application in Health Risk Assessment.” Critical Reviews in Toxicology 33 (5): 505–42. https://doi.org/10.1080/10408440390242360.
Confidence interval: The statistical confidence
limit applied to the BMD estimated from the models.
Variance type: Is a string character describing the type of variance of the data. Possible values are constant (default), non constant, model and inferred.
Significance threshold for levene test: In case of variance type equal to inferred, a levene test is used to test the assumption of constant or non constant variance. This parameter specify the threshold for the significant p-value for the test. Default value is 0.05.
Lack-of-fit p-value: Threshold used to identify good fitted models. Default 0.1
Select models: Predefined subselection of the available models. Possible values are custom, regulatory, degree of freedom, all. The user can use any of the available subset of models or select the models himself from the checkbox list. Regulatory models include models of interest for regulatory agencies. Degree of freedom models comprise models with a degree of freedom less than n-1, considering n is the number of doses tested. Custom allows for manual selection of desired models.
BMD parameters.
The results of the BMD analysis will be reported in a tabular format with the following column names:
The table reports all the models that can be estimated for every gene.
BMD results
The user can discard the model fitted and displayed in the BMD analysis result table by means of the following criteria:
Filtering of BMD results.
The selection of the final optimal model for every gene can be selected as the one with the lowest AIC value, or as the average consensus model. The consensus model should be computed only merging the models that pass the filtering criteria.
When clicking on a row of the BMD result table, the model fitted to the response data of the corresponding gene is visualized along with the corresponding effective doses. Furthermore the model formula with the estimated parameters are also visualised.
BMD model fitting of a specific gene.
#Compare different experiments
Once the models have been fitted their distribution across different experiments and different time-points can be explored by different visualizations.
The density distribution of the BMD values can be investigated by means of histograms. The user can select the relevant variable of interest and different grouping variables. Filtering can be also performed to plot only a subset of the results.
The following parameters are required:
Parameters for plotting BMD histogram.
BMD density
The distribution of the lack-of-fit p-values, AIC, and \(R^2\) can be investigated by means of histograms. The user can select the relevant variable of interest and different grouping variables. Filtering can be also performed to plot only a subset of the results.
The following parameters are required:
Parameters for plotting the distribution of the lack-of-fit p-values
Lack-of-fit.
The relationship between the BMD/BMDL values can be investigated by means of scatterplots. The user can select the relevant variable of interest and different grouping variables. Filtering can be also performed to plot only a subset of the results. The following parameters are required:
Parameters for plotting the relationship between BMD/BMDL.
Relationship between the BMD/BMDL values
The proportion of fitted models can be investigated by means of pie charts. The user can select different grouping variables. Filtering can be also performed to plot only a subset of the results. In this plot the variable of interest is fixed to the models and cannot be changed by the user.
The following parameters are required:
Parameters to compute the proportion of fitted models.
Proportion of fitted models to compute BMD.
The proportion of dose-dependent genes by time points can be investigated by means of barplots. The user can select the grouping variable. Filtering can be also performed to plot only a subset of the results. In this plot the variable of interest is fixed to the time point and cannot be changed by the user.
The following parameters are required:
Parameters for plotting dose-dependent genes
Proportion of dose-dependent genes by time points
The distribution of the bmd values at the whole transcriptome level can be investigated by means of the ECDF plot. This plot is showing the percentage of genes (y-axis) with a bmd lower than a certain value (x-axis). The user can select the relevant variable of interest and different grouping variables. Filtering can be also performed to plot only a subset of the results. The following parameters are required:
Parameters for plotting ECDF.
Distribution of the bmd values at the whole transcriptome level.
The number of genes shared by the different experiments can be investigated by means of an upset plot. The user can select the relevant variable of interest and different grouping variables. Filtering can be also performed to plot only a subset of the results. The following parameters are required:
Parameters for plotting upset plot.
Number of genes shared by the different experiments.
The aim of this analysis is to identify genes who are more often dose-dependent across different experiments or time points. The genes are ranked according to their occurrence frequency across the different experiments. Genes with a frequency below a certain threshold can be removed from the analysis. The tool compute the frequencies of the genes and plots them using a horizontal barplot, where genes are ranked based on their frequency. A GSEA is performed by means of the gprofiler package on the ranked list of genes.
The following parameters are required:
Lollipop gene frequency
gsea
FunMappOne is a user friendly tool for the comparison of functional enrichment analysis af multiple experiments. The tool is publicly available at. Please refer to the FunMappOne paper for more details.
The FunMappOne tools is embedded into the BMDx as a module for the functional characterization of the dose-dependent genes.
Once the FunMappOne analysis is performed a heatmap with the enrichment results for each experiment is visualized. The enrichment heatmap shows each time point in each experiment as a separate column. Pathways are shown in the rows, and coloured boxes indicate enrichment of the pathway at the specific condition. When plotted values are shown in chromatic scale, the colour of the box changes according to the value. For example, in the figure below, the colour indicates the mean BMD value of the genes contributing to the enrichment of the pathway, showing the difference in the BMD values between the two experiments.
Heatmap of enrichment results from FunMappOne
A range plot can be visualised to show the span of values of the BMDL, BMD and BMDU of the different enriched pathways. The following parameters are required:
Range plot BMD, BMDL, and BMDU values
Gene BMD in a Pathway tab shows all pathways with their enrichment p-values. Selecting a row of the table plots a graph under the table with all the genes in the pathway, their BMD values as well as the lower and upper confidence bound BMD. Note! Deselect the row before selecting the next row to only include the genes in the desired pathway.
BMD values of genes belonging to a selected pathway
The genes mapped to each term are shown as a table in Pathways table tab. Time point for which data is shown can be changed from the time point drop menu. The tables can be downloaded as a single Excel file by clicking download.
Table including enriched pathways
Finally, the genes in different pathways can be explored in the form of a heatmap. The drop menus allow for the specification of the hierarchy level, the terms belonging to that level and specification of values shown on the heatmap. The heatmap then shows the genes mapped to the selected term on rows with the experiments and time points as columns, and coloured boxes indicating the value specified in the Show values drop menu.
BMD of altered genes across experiments
This analysis is used to compare the expression profiles of pairs of genes across the doses. Users can filter by specific columns, experiments, and timepoints. Given the optimal model fitted to every gene, N predicted expression values are computed. These are used to compute the Pearson correlation between each pair of genes. The parameters required for the analysis are the following:
- Number of predictions:
The number of predicted values for each gene. Default = 1000 -
Number of cores: The number of cores used for the
analysis - Organism: The organism for which the
expression values have been computed (available values are human, mouse
and rat)
The result of the analysis are visualized in a tabular format with the following columns:
Gene pairs correlation
Furthermore the genes will be clustered by performing a hierarchical clustering on top of the correlation matrix. For this the Number of clusters parameter is required. Default value = 2. The clustering also includes annotations related to network properties of those genes within the PPI, such as degree, closeness, and eigenvector, as well as information on whether a gene is a transcription factor (TF) or a microRNA.
Genes belonging to each cluster can be
visualized as a table.
Genes in each cluster
For each cluster, an enrichment analysis will be performed by means of the R package gprofiler2. For this the Correction method parameter is required. Default value = “fdr”. Enrichment results can be visualized in a gost plot or bubble plot.
Gost enrichment plot
Enrichment results
can be visualized and exported as a table.
Gost enrichment table
For all the statistically significant pathways identified with the FunMappOne tool or KE enrichment for the selected experiment and time point, the gene expression profile of the genes annotated in the pathways can be plotted and compared.
Gene expression profile of genes belonging to the same pathway
Gene expression profile of genes belonging to the same KE
User can also visualize how these genes are associated to each other within the PPI, taking into account their properties (degree and closeness) and characteristics, such as whether they are TF or miRNA.
Clustering of genes correlated in the citokine-citokine pathway
Clustering of genes correlated in the same KE
The PPI network of the selected organism (human, mouse, or rat) will be used to analyse the pairs of correlated genes. A subgraph of the PPI network is created by selecting all the edges incident on pairs of correlated genes. The genes in the subgraph are ranked according to their degree (number of edges incident on the node). The gprofiler2 package is used to perform a GSEA analysis over the genes ranked by degree centrality.
Gene pairs correlation
The parameters used for this analysis are the following: - Correlation interval: Used to filter pairs of genes with correlation under this threshold - Correction method: Method used to correct the p-value of the GSEA analysis
Gene pairs correlation
Enrichment analysis is performed using Fisher’s exact considering recent annotation of genes to key events (KE) and adverse outcome pathways (AOPs) from AOP-Wiki (https://aopwiki.org/), for more details. The following parameters are required to run the enrichment:
User has the option to add a background file or use existing background from BMDx app by selecting “use annotate gene as background”.
There are options to select that allow you to focus on KEs and AOPs that are significant.
AOP and KE enrichment parameters.
In this section, KE enrichment results can be visualized as a table. The results are exportable. In addition, POD distribution values can be visualized by KE type, KE level, and across KE annotation biological systems as box plots. The results are exportable.
KE enrichment table results.
BMDL by KE type.
BMD by KE type.
BMDU by KE type.
BMD by KE level.
BMD by KE annotation to biological systems.
Within the KEr window, results can be visualized in a KE-KE network. Select the experiment you are interested in and create a KE-KE network. If you want to focus on KEs associated with specific SSbD categories related to safety, you can choose among the SSbD categories. The default visualization “enlarges KE”, meaning it considers both enriched KEs and non-enriched KEs connected to enriched KEs, providing a comprehensive view of the network. This MOA reconstruction is based on a recent approach that prioritizes plausible molecular initiating events (MIEs) and adverse outcomes (AOs) and maps them back to the AOP network. For more details, refer to Del Giudice et al. (2024). This metric considers the proximity to an enriched KE on the KE-KE network and the presence of significantly altered genes in the event’s annotated gene sets. Non-enriched KEs are colored in gray, while enriched KEs are colored in red. Different shapes represent the types of KEs. The user can also filter based on the max path length, number of AOs and MIEs to focus on a specific section of the network. Clicking on events in the network allows you to see more information about the specific event.
MOA reconstruction of lung fibrosis.
KE-KE networks of different conditions can also be compared by selecting common elements between conditions/experiments or unique elements. For example, “Network A minus Network B” shows what is unique to Network A.
Comparison of KEKE network across conditions.
The AOP enrichment table presents all enrichment results. The results are exportable.
AOP enrichment table results.
Moreover, the AOP section includes a mechanistic visualization that contextualizes KEs within an AOP and it integrates with data from PPI to illustrate how they are interconnected through molecular interactions.
The top relevant genes have been ranked according to multiple network
properties that reflect their connectivity within the PPI network. To
visualize the top most relevant genes based on a user defined number,
the user should double click on the enriched KEs in the figure at the
bottom of the page. These genes will then appear. KEs that share those
genes will be considered together and separated by semicolon. Each time
a different id of interest is selected, it is important to reinitialize
the clustering to ensure accurate and updated results.
Enriched KEs of the AOP Angiotensis II leading to fibrosis.
Top genes within the enriched KEs of the AOP Angiotensis II leading to fibrosis.
Leading genes found in the enriched KEs of the Angiotensin II AOP linked to fibrosis, with info boxes detailing each gene characteristics.
In addition, the tool allows to visualize the distribution of BMD values across AOPs grouped by SSbD categories and reports the number of AOPs per category.
Distribution of BMD values grouped by SSbD categories
Once the enrichment analysis is completed, users can visualize the enriched AOPs using a bubble plot or range plot. They have the option to display either AOP names or AOP IDs on the y-axis. The font size, as well as the plot’s height and width, can be adjusted for optimal export as a PDF. Additionally, AOP enrichment results can be exported. Users can also choose to focus on a specific range of AOPs that fall within particular SSbD categories related to safety, for example, “Carcinogenicity”.
AOP fingerprint bubble plot.
AOP fingerprint range plot.